Sociology 229: Advanced Regression Models
Short Assignment #2: Count Models
Due: Start of class (9:00) January 26
This assignment requires a dataset on the course website entitled “Assignment 2 Count Data.dta”. The dataset includes information on approximately 2,500 people from the GSS. It includes a variable “memnum”, which is the sum of a series of dummy variables indicating individual membership in different types of voluntary associations, such as school organizations, religious groups, and sport/hobby associations. It isn’t exactly equivalent to the total number of memberships in associations that an individual has (because some people might join more than one association of a given type), but it is close, so for the purposes of the assignment you can describe it as total memberships. Individuals who have many memberships in voluntary organizations are often said to be “civically engaged”, and communities with lots of memberships are believed to have high levels of social capital.
Notes: Education is measured in years. TV watching is measured in hours per day (on average).
Question 1: Based on the overdispersion parameter alpha, which model was preferred: the Poisson regression or the negative binomial model? Were the results similar overall? Did the choice of model affect any conclusions you might draw from the analysis?
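Note: as a reminder of what alpha does, here is a minimal Python sketch. It assumes the common "mean dispersion" (NB2) form of the negative binomial model, in which the variance is mu + alpha * mu^2. The function names and the example values are hypothetical and are not taken from the assignment data.

```python
def poisson_variance(mu):
    """Poisson assumes the variance equals the mean."""
    return mu


def negbin_variance(mu, alpha):
    """Negative binomial variance, assuming the NB2 form: mu + alpha * mu^2."""
    return mu + alpha * mu ** 2


# Hypothetical mean count of 2 memberships.
mu = 2.0
print(poisson_variance(mu))        # variance under Poisson
print(negbin_variance(mu, 0.0))    # alpha = 0 reproduces the Poisson variance
print(negbin_variance(mu, 0.5))    # alpha > 0 implies overdispersion
```

When alpha is zero the two models coincide, which is why a test of alpha = 0 is used to choose between them.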
Question 2: Interpret the coefficients for education (measured in years) and television viewing time (hours per day). Discuss the raw coefficient, the incidence rate ratio (which is analogous to an odds ratio), and the % difference in incidence rate. (Note: since the model involves constant exposure, which is often the case, you can use the word “count” instead of “incidence rate” in describing results. If exposure varies, you should use the term “rate” rather than “count”.)
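Note: the three quantities are linked by simple arithmetic. The incidence rate ratio is the exponentiated raw coefficient, and the % difference is 100 * (IRR - 1). The Python sketch below illustrates the conversion. The coefficient values are made up for illustration and are not results from the assignment dataset.

```python
import math


def incidence_rate_ratio(coef):
    """Exponentiate a raw count-model coefficient to get the IRR."""
    return math.exp(coef)


def percent_difference(coef):
    """Percent change in the expected count for a one-unit increase in x."""
    return 100.0 * (math.exp(coef) - 1.0)


# Hypothetical raw coefficients (NOT from the assignment data).
b_education = 0.10   # per additional year of education
b_tv = -0.05         # per additional hour of TV per day

for name, b in [("education", b_education), ("tv hours", b_tv)]:
    print(name, round(incidence_rate_ratio(b), 3), round(percent_difference(b), 1))
```

An IRR above 1 (a positive coefficient) means the expected count rises with the variable. An IRR below 1 means it falls.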
Question 3: Discuss the impact of education and viewing on membership in associations, based on predicted probabilities computed above (one of which is for a hypothetical case, and one of which is for the “average” case).
Question 4: Comment on the results from the zero-inflated negative binomial model. (Note that the “inflate” equation predicts zeros, so a negative coefficient corresponds to a positive impact on being non-zero. So, you need to ‘flip’ signs to interpret results consistently with the count model.) Often the effects are consistent between both models, but sometimes a variable mainly affects the count equation or the inflation equation only. Does this happen here? Suggest an interpretation.
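Note: the sign flip can be seen in a small Python sketch. It assumes the inflate equation uses a logit link, which is a common default but should be checked against your own output. The function names and linear-predictor values are hypothetical.

```python
import math


def prob_always_zero(inflate_xb):
    """Logit link (assumed): probability of being in the 'always zero' group."""
    return 1.0 / (1.0 + math.exp(-inflate_xb))


def expected_count(inflate_xb, count_xb):
    """Overall expected count: P(not always zero) times the count-model mean."""
    return (1.0 - prob_always_zero(inflate_xb)) * math.exp(count_xb)


# Hypothetical linear predictors: lowering the inflate predictor (as a negative
# inflate coefficient would) lowers P(always zero), so the expected count rises.
print(prob_always_zero(0.0), prob_always_zero(-1.0))
print(expected_count(0.0, 0.0), expected_count(-1.0, 0.0))
```

A variable with a negative inflate coefficient and a positive count coefficient therefore pushes memberships in the same direction in both parts of the model.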
Turn in the following: